Principal component analysis combined with truncated-Newton minimization for dimensionality reduction of chemical databases

Authors

  • Dexuan Xie
  • Suresh B. Singh
  • Eugene M. Fluder
Abstract

The similarity and diversity sampling problems are two challenging optimization tasks that arise in the analysis of chemical databases. As a first step to their solution, we propose an efficient projection/refinement protocol based on principal component analysis (PCA) and the truncated-Newton minimization method implemented by our package TNPACK (PCA/TNPACK). We show that PCA can provide the same initial guess as the singular value decomposition (SVD) for the distance-geometry optimization problem if each column of the database matrix has a mean of zero. Hence, our PCA/TNPACK approach is analogous to the SVD/TNPACK projection/refinement protocol that we developed recently for visualizing large chemical databases. Using PCA/TNPACK and the Merck MDDR database (MDL Drug Data Report), we further investigate the projection/refinement procedure with regard to the preservation of the original clusters of chemical compounds, the accuracy of similarity and diversity sampling of chemical compounds, and the potential application in the study of structure-activity relationships. We also explore, through simple experiments, the accuracy and efficiency of the PCA/TNPACK procedure compared with those of a global optimization algorithm (simulated annealing, as implemented by the program package SIMANN) in producing the projection mapping of a database. Numerical results show that the 2D PCA/TNPACK mapping can preserve the distance relationships of the original database and is thus valuable as a first step in similarity and diversity applications. Of course, the generation of a global rather than local minimizer and its interpretation in terms of pharmaceutical applications remain a challenge. Since all numerical tests are performed on the Merck MDDR database, results are representative of realistic cases encountered in the field of drug design, and may help analyze properties of medicinal compounds.
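The equivalence claimed in the abstract (PCA and SVD yield the same initial projection when every column of the database matrix has zero mean) can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the random matrix only stands in for a descriptor matrix (it is not MDDR data), and the TNPACK refinement step is not shown.

```python
import numpy as np

def svd_projection(X, k=2):
    """Project rows of X onto the top-k right singular vectors (no centering)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]  # equals X @ Vt[:k].T

def pca_projection(X, k=2):
    """Project rows of X onto the top-k eigenvectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)              # PCA always centers the columns
    C = Xc.T @ Xc / (X.shape[0] - 1)     # sample covariance matrix
    w, V = np.linalg.eigh(C)             # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:k]        # indices of the k largest eigenvalues
    return Xc @ V[:, top]

def pairwise_distances(Y):
    """Euclidean distance between every pair of rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Synthetic stand-in for a database matrix: 50 "compounds", 8 "descriptors".
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8)) * np.array([5.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
X -= X.mean(axis=0)                      # enforce the zero-column-mean condition

Y_svd = svd_projection(X)
Y_pca = pca_projection(X)

# Individual axes may differ by a sign flip, so compare pairwise distances,
# which are unaffected by such flips.
max_gap = np.abs(pairwise_distances(Y_svd) - pairwise_distances(Y_pca)).max()
print("largest difference in projected distances:", max_gap)
```

Comparing pairwise distances rather than raw coordinates is a deliberate choice here: singular vectors and eigenvectors are only defined up to sign, while the distance relationships that the projection is meant to preserve do not depend on that sign.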

Related articles

A Supervised Probabilistic Principal Component Analysis Mixture Model in a Lossless Dimensionality Reduction Framework for Face Recognition

In this paper, we first propose a supervised version of the probabilistic principal component analysis mixture model. Then, we consider a learning predictive model with projection penalties as an approach to dimensionality reduction without loss of information for face recognition. In the proposed method, first a local linear underlying manifold of data samples is obtained using the supervised...

2D Dimensionality Reduction Methods without Loss

In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques have been applied in a lossless dimensionality reduction framework for face recognition. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...

Visualization of Chemical Databases Using the Singular Value Decomposition and Truncated-Newton Minimization

We describe a rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) as a first step in chemical database analyses and drug design applications. The compounds in the database are described as vectors in the high-dimensional space of chemical descriptors. The algorithm is based on the singular value decomposition (SVD) combined with a minimization procedure ...

An Efficient Projection Protocol for Chemical Databases: Singular Value Decomposition Combined with Truncated-Newton Minimization

A rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) is presented as a first step in database analysis and design applications. The projection mapping of the compound database (described as vectors in the high-dimensional space of chemical descriptors) is based on the singular value decomposition (SVD) combined with a minimization procedure implemente...

A novel dimensionality reduction technique based on kernel optimization through graph embedding

In this paper, we propose a new method for kernel optimization in kernel-based dimensionality reduction techniques such as Kernel Principal Components Analysis (KPCA) and Kernel Discriminant Analysis (KDA). The main idea is to use the graph embedding framework for these techniques and, therefore, by formulating a new minimization problem to simultaneously optimize the kernel parameters and the ...

Journal:
  • Math. Program.

Volume 95

Publication date: 2003